Handling Concept Drift in a Text Data Stream Constrained by High Labelling Cost
Authors
Abstract
In many real-world classification problems the concept being modelled is not static but changes over time, a situation known as concept drift. Most techniques for handling concept drift rely on the true classifications of test instances being available shortly after classification so that classifiers can be retrained to handle the drift. However, in applications where labelling instances with their true class has a high cost, this is not a reasonable assumption. In this paper we present an approach for keeping a classifier up to date in a concept drift domain that is constrained by a high cost of labelling. We use an active-learning-style approach to select for labelling those examples that are most useful in handling changes in concept. We show how this approach can adequately handle concept drift in a text filtering scenario while requiring just 15% of the documents to be manually categorised and labelled.
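The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes plain uncertainty sampling for a binary text filter, where the documents whose predicted probability lies closest to 0.5 are sent for manual labelling. The function name `select_for_labelling`, the `budget` parameter, and the example probabilities are all hypothetical; only the 15% figure comes from the abstract.

```python
def select_for_labelling(probabilities, budget=0.15):
    """Return indices of the least confident predictions in a batch.

    Assumption: `probabilities` holds a binary classifier's P(relevant)
    for each document, and `budget` is the fraction we can afford to label.
    """
    if not probabilities:
        return []
    n_select = max(1, round(budget * len(probabilities)))
    # Uncertainty sampling: rank documents by distance from the 0.5 boundary.
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return sorted(ranked[:n_select])


# Hypothetical classifier outputs for a batch of 20 incoming documents.
batch = [0.97, 0.51, 0.03, 0.88, 0.45, 0.99, 0.10, 0.62, 0.92, 0.05,
         0.95, 0.08, 0.85, 0.15, 0.98, 0.02, 0.90, 0.12, 0.93, 0.07]

# With a 15% budget, 3 of the 20 documents are chosen for manual labelling.
chosen = select_for_labelling(batch)
print(chosen)
```

In a streaming setting, the newly labelled documents would then be used to retrain or update the classifier before the next batch arrives; how that update is done is left open here.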
Similar resources
Detecting Concept Drift in Data Stream Using Semi-Supervised Classification
Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...
Algorithm to handle Concept Drifting in Data Stream Mining
Data Stream Mining is the evolving field of research. Mining continuous data streams brings unique opportunities but also new challenges. This paper will describe and evaluate the proposed classifier which uses ensemble classifier along with the boosting concept. Adaptive windowing is also used for handling the data stream. Empirical study will show that the proposed classifier takes less memor...
Feature Based Data Stream Classification (FBDC) and Novel Class Detection
Data stream classification poses many challenges to the data mining community. Here this paper solves all the challenges such as infinite length, concept-drift, concept-evolution, and feature-evolution. Since a data stream is theoretically infinite in length, it is impractical to store and use all the historical data for training. Concept-drift is a common phenomenon in data streams, which occu...
Classification and Novel Class Detection of Data Streams in a Dynamic Feature Space
Data stream classification poses many challenges, most of which are not addressed by the state-of-the-art. We present DXMiner, which addresses four major challenges to data stream classification, namely, infinite length, concept-drift, concept-evolution, and feature-evolution. Data streams are assumed to be infinite in length, which necessitates single-pass incremental learning techniques. Conce...
Decision Tree for Dynamic and Uncertain Data Streams
Current research on data stream classification mainly focuses on certain data, in which precise and definite value is usually assumed. However, data with uncertainty is quite natural in real-world application due to various causes, including imprecise measurement, repeated sampling and network errors. In this paper, we focus on uncertain data stream classification. Based on CVFDT and DTU, we pr...